ABSTRACT

Two  statistical  prediction  models  for  the  24-hour  surface  pressure 
change  are  developed.     One  model  employs  the  terms  in  a  dynamic 
model  as  the  independent  variables  in  a  linear  regression  equation. 
The other model combines these variables with parameters capable of
reflecting the long-wave, long-term influences in a multivariate
discriminant analysis. The regression equations were developed from
data taken from the month of November 1962 at 50N latitude. A
discussion of the results of both methods is presented along with a
critique of the procedures used in obtaining the data.

1. Introduction

Numerical  techniques  for  the  sea-level  pressure  prognosis  have 
not  shared  the  same  success  as  those  used  for  forecasting  the  500-mb 
contour  map.     Methods  presently  in  operational  use  consist  in  general 
of  a  500-mb  prognosis  coupled  with  prognosis  of  thickness  using  some 
form  of  the  first  law  of  thermodynamics.     Methods  similar  in  nature 
have been suggested by Haltiner and Hesse [1958] and Reed [1956].
In addition, it has become standard practice with such units as the
U. S. Navy Fleet Numerical Weather Facility, Monterey, California
(FNWF), to superimpose certain empirical corrections on the location
and intensity of cyclones and anticyclones. The inadequacy of
numerical methods to predict cyclogenesis and anticyclogenesis is
perhaps the greatest deficiency of the presently operational numerical
techniques. This study was undertaken to help eliminate some of
these problems.

Two  models  were  developed:  (1)  a  model  using  solely  dynamic 
parameters  in  the  form  of  a  multiple  linear  regression  equation;  and 
(2)  a  stepwise  multivariate  regression  analysis  using  the  "dynamic" 
predictors  as  well  as  certain  other  parameters  selected  to  reveal 
long-term  and  long-wave  influences  on  the  surface  pressure  change. 
For  convenience  the  two  models  are  hereafter  referred  to  as  "the 
dynamic  model"  and  "the  statistical  model",    respectively.     Separate 
and  detailed  descriptions  of  the  two  models  are  presented  in  the 
following sections.

Latitude 50N in early winter months was chosen as a latitude
offering typical, if not difficult, forecast problems for the investigation.
The dependent data were taken, insofar as possible, from 30 consecutive
days beginning in late October 1962. Early December of the same
year provided five days of independent data. Twenty-four geographical
locations, at intervals of 15 degrees of longitude around the latitude
belt, were chosen as "stations". The "stations" were arbitrarily
numbered from 1 to 24 in an eastward direction, starting at ocean
station "Papa" in the Gulf of Alaska.
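
The station layout described above can be sketched in a few lines. This is only an illustration: the starting longitude of 145W for ocean station "Papa" is a commonly quoted position and is not given in the text, and the function name is hypothetical.

```python
# Assumption (not stated in the text): ocean station "Papa" lies near 50N, 145W.
PAPA_LON_EAST = -145.0


def station_longitudes(start_lon=PAPA_LON_EAST, spacing=15.0, count=24):
    """Map station number (1..count) to longitude in degrees east, within [-180, 180)."""
    longitudes = {}
    for number in range(1, count + 1):
        lon = start_lon + spacing * (number - 1)      # step eastward from station 1
        longitudes[number] = (lon + 180.0) % 360.0 - 180.0
    return longitudes


stations = station_longitudes()
```

Under these assumptions station 1 sits at 145W and station 24 closes the circle at 160W, one 15-degree step west of station 1.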

The data used in both models were taken from numerical computations
and printout charts prepared by use of the CDC 1604 electronic
digital computer. The programs and data tapes necessary to compute
and print the charts were supplied by FNWF. All statistical computations
were made on the same computer using selected programs from
the BIMD¹ Fortran library.

The techniques employed in this work were not designed to supplant
the numerical methods now in use. This effort was conducted so as
to illuminate some of the factors not now considered that may be
influential in causing a sea-level pressure change. Time steps of
twenty-four hours were attempted in connection with both models in
order to test the efficacy of this longer-than-normal time step in
conjunction with statistical methods.

¹ A compilation of statistical electronic computer programs that were
compiled and edited by the Biology and Medical Department of the
University of California, Los Angeles. This manual is distributed
through the UCLA campus bookstore.


2.   A  derivation  of  the  dynamical  prediction  equation 

The following development is taken after Reed [1962], with some
modifications and simplifications in the later stages of the derivation.
Pertinent remarks will indicate where these modifications are
applicable.

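The frictionless vorticity equation for the 1000-mb level may be
well approximated by

$$\frac{\partial \zeta_0}{\partial t} = -\mathbf{V}_0 \cdot \nabla(\zeta_0 + f) + f\left(\frac{\partial \omega}{\partial p}\right)_0 . \tag{1}$$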

If  we  assume  a  parabolic  vertical  velocity  profile  between  the 

surface  and  500  mb  (subscript  5)  of  the  form 


CO  —  LOo  +•  (<^5  ~  LO°) 


z 


(2) 


and substitute for $(\partial\omega/\partial p)_0$ in (1), we obtain

A(X+f)  =  -V-VfX+-f:)-^£r^5-u;0)  .      (3) 

The geostrophic wind is used to approximate the vorticity and,
following Reed, the 1000-mb contour pattern may be regarded as
consisting of a set of equally-spaced circular highs and lows of the
form

$$z_0 = -\frac{f}{g}Uy + B\sin\frac{2\pi x}{L}\sin\frac{2\pi y}{L} \tag{4}$$

superimposed on a constant zonal current U. Here x and y are the
eastward and northward distance elements, respectively, and the
absolute constant resulting from the integration of U along the meridian
is taken as zero. Then the 1000-mb relative vorticity becomes


Substituting  (5)  into  (3)  yields  the  equation 

AR^'-fctf, ) ="  Vo  ■  Vte+G  LUTu  )-£f  (J£jg J 

-        '      f  (6) 

8ttx3     ;  9       J 

The  particular  form  of  the  adiabatic  thermodynamic  equation 
used  here  is 


(7) 


With  the  assumption  of  a  linear  geostrophic-wind  hodograph  in  the 
layer  between  1000  mb  and  500  mb,  equation  (7)  may  be  integrated 
with  respect  to  p  from  1000  to  500  mb  to  give 


Next  multiply  (8)  by 

JL'=  -a    ,-f y  ,  (9) 

**    ^j^?^  '  <9) 


.- 


a  slowly  varying  parameter  which  will  be  regarded  here  as  constant, 
and  add  the  modified  version  of  (8)  to  (6)  giving 

Use  of  the  kinematic  boundary  condition  OO^^Xfa^ Vir fa     allows  the 
vertical velocity $\omega_0$ to be included within both brackets of (10) as a
terrain effect. This last effect is not considered in this paper, since
it is not one of the dynamic factors explicitly selected for the statistical
regression employed in this study. Hence we are left with the
result

^^H^^-Zfj^-X^o+H'tX  (2s-ZS\  (n) 

as  the  prediction  equation. 

Equation  (11)  is  now  rearranged  to  give 


or 


(13) 


We next make use of the principle first introduced by Fjortoft
[1952] and extended by Reed [1962], of employing an equivalent
advecting wind which gives the same instantaneous advection as $\mathbf{V}_0$
but which has the property of changing more slowly with time. This
concept has proved to be particularly valuable in graphical integration
of the dynamical equations where long time steps are employed.
Thus (13) may be written in the equivalent form


UM5-2e->H]  =  -Ve.VtJiZs~Z0+H'] 


or,  alternatively  as 


where 


(14) 


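$$\mathbf{V}_0 = \frac{g}{f}\,\mathbf{k}\times\nabla z_0, \qquad \mathbf{V}_E = \frac{g}{f}\,\mathbf{k}\times\nabla z_E, \qquad z_E = [\,k z_5 + H\,]. \tag{16}$$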


3.    Simplification  of  the  dynamic  model 

In Reed's model the scalar field of $z_E$ whose gradient defines $\mathbf{V}_E$
has a term (-M) representative of terrain effects in addition to those
in equation (16). As already noted, in approaching the statistical
application of the dynamic model we have set M=0.

A  word  of  discussion  regarding  Reed's  function  G  and  our  function 
H  is  appropriate.     From  equations(6)  and  (9),   G  is  expressible  as 

Reed  uses  a  mean  value  of  k=0.55  or  k=1.22  for  wave  number  N=6 
in  connection  with  his  12-hr  Lagrangian  prediction  technique.     It  may 
be  shown  that   (3  =  l**-^— ,__, -  ,  J  ( 3 lhZ4>)  and  hence  G  is  an  increasing 
function of latitude up to $\phi$ = 45N. For 45 < $\phi$ < 55 degrees, which
is typical of the range of latitude encountered in this study, G is a
slowly  decreasing  function.      The  mean  northward  gradient  of  the  G- 
field  centered  at  latitude  50N  over  a  10-degree  span,   is  such  that 
WY,~  C5  knots  in  an  eastward  direction.

Recall  that  our  function  H  is  given  by    Gj~"      §       ^   %     anc*  using 
(1+k) = 2.22 it follows that the term involving U is equivalent to a
mean maximum zonal wind (according to Namias and Clapp [1951])
of 2 knots. In lower latitudes the effects of the G and U terms are of
opposite algebraic sign, whereas in the latitude belt of this discussion
the gradient of H is equivalent to an eastward wind of 5 knots. Consequently
the H-field in (16) has been neglected relative to $k z_5$ in the
$z_E$ field. With this simplification the prediction equation (15) becomes


$$\frac{\partial z_0}{\partial t} = \mathbf{V}_E\cdot\nabla h + \frac{\partial}{\partial t}(k z_5) \tag{17}$$

since $h = z_5 - z_0$.

The last term, $\partial(k z_5)/\partial t$, in (17) may be obtained by employing the
graphical prediction technique for the barotropic model as described
in Haltiner and Martin [1957, pp. 395-398]. If $\bar{z}_5$ represents the
space-mean 500-mb height, the Fjortoft method leads to the result

at the level of nondivergence (assumed to be 500 mb). Fjortoft's
space-mean advecting wind $\bar{\mathbf{V}}_{5+J}$ is given by


We  therefore  have  the  familiar  result 


(20) 


The  Fjortoft  graphical  treatment  makes  use  of  the  following  function 
for  J:  l 

o  *• 


Petterssen [1956, p. 392] gives the distribution of the J-field
based upon a grid mesh d=1000 km. At latitude 50N, the gradient of
J which he shows is equivalent to an easterly geostrophic wind of
8.5 knots. In regions of maximum advection of $(\bar{z}_5 - z_5 + J)$, it appears
reasonable on the basis of a statistical approach to approximate the
500-mb advecting space-mean wind $\bar{\mathbf{V}}_{5+J}$ by $\bar{\mathbf{V}}_5$, where $\bar{\mathbf{V}}_5$ is
the space-mean geostrophic wind at 500 mb. Moreover, since our
"stations" are far apart and confined to the 50N latitude circle, the
working hypothesis made in the following statistical analysis is that
$\partial\bar{z}_5/\partial t$ gives only "feedback" contributions to the term $\bar{\mathbf{V}}_5\cdot\nabla(\bar{z}_5 - z_5 + J)$.

For simplicity the 1000-mb height change resulting from (15) and
(18) becomes

$$\frac{\partial z_0}{\partial t} = \mathbf{V}_E\cdot\nabla h + k\,\bar{\mathbf{V}}_5\cdot\nabla(\bar{z}_5 - z_5 + J). \tag{21}$$

Note that since $\mathbf{V}_E = k\mathbf{V}_5$ it follows that for 24-hr advection it is
appropriate to consider the first term on the right side of (21) as a
reduced advection. Furthermore the vector $k\bar{\mathbf{V}}_5$ in (21) has also
been treated as the reduced advecting wind $\mathbf{V}_E$, since an interpretation
of this kind has been found useful in "advecting" the movement
of rain areas associated with progressive sea-level cyclones by
Renard [1959]. We have noted that Reed gives the value k=0.55
as appropriate to a Lagrangian forecast technique with 12-hr time
increments. In our computations k was rounded off to 0.5 in view of
the subjectivity of hand measurements of the space-mean geostrophic
wind $\bar{\mathbf{V}}_5$. Hence with these simplifications our prediction model
now becomes

Here the regression coefficients $A_0$, $A_1$, $A_2$ are introduced as unknowns
in order to absorb any statistically-determined feedback
relationships contained in the preceding analysis, such as, for example,
the assumption $k\bar{\mathbf{V}}_{5+J}\cdot\nabla(\bar{z}_5 - z_5 + J) \approx k\bar{\mathbf{V}}_5\cdot\nabla(\bar{z}_5 - z_5 + J)$.
In equation (22) the best-fit determination of $A_0$, $A_1$, and $A_2$ will
be made by least squares. In essence, however, this equation presents
a dynamically formulated problem similar to that of Reed [1956] and
Haltiner and Hesse [1958].

4. Computational procedures for the dynamic model

The local change of the 1000-mb height in equation (22) is written for
a 24-hr time step and is considered to be strictly proportional to the
24-hr change of sea-level pressure, $\Delta p_0$, which serves as the dependent
variable. The advecting terms in the working equation (22) are then
used as the independent variables in a multiple linear regression
equation of the form

$$Y = A_0 + A_1 X_1 + A_2 X_2 ,$$

$$\Delta p_0 = A_0 + A_1[-\mathbf{V}_E\cdot\nabla h] + A_2[-\mathbf{V}_E\cdot\nabla(\bar{z}_5 - z_5 + J)]. \tag{23}$$

The advecting terms were evaluated by a quasi-Lagrangian technique
using fixed-point computations at points determined by trajectory
tracing. At each of the 24 stations for 30 days an upwind point was
determined. This was accomplished by specifying a geostrophic wind
from the 500-mb space-mean height field terminating at the station in
question and then tracing the upwind trajectory in the contour channel
for a distance corresponding to 24 hours. In the cases of difluent
and confluent contours, 6-hour steps on the initial chart were used so
that more representative space-mean winds were available at each step.

The parameters $\bar{z}_5$ and J as used in the vorticity term were computed
using a square grid distance of 782 km. The space-mean 500-mb height
field used to determine the advecting wind was obtained by subjecting
the 500-mb heights to four scans using a "smoother" of the type


where subscripts S and I refer to smoothed and initial values,
respectively.

As previously noted, k in $\mathbf{V}_E = k\bar{\mathbf{V}}_5$ was taken to be 0.5 and the
wind speed in all cases was reduced by this factor. With this choice
of advecting wind, the upwind point thus determined was the same for
the advection computations of both h and $(\bar{z}_5 - z_5 + J)$. The
difference of the values of h and $(\bar{z}_5 - z_5 + J)$ at the upwind point
from their initial values over the station was recorded as the 24-hr
advective change.

FNWF data tapes served to give printouts of the entire fields
for all 24-hr forecast periods under consideration. Values of $\Delta p_0$
at the stations along 50N were then obtained by bilinear interpolation.
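
Bilinear interpolation of a gridded field to a station position can be sketched as follows. The grid layout (a list of rows, with the station position given in fractional grid units) and the function name are assumptions made for illustration; the text does not describe how the printout grid was indexed.

```python
def bilinear(field, x, y):
    """Interpolate field[row][col] at fractional column x and fractional row y.

    Assumes 0 <= x <= last column index and 0 <= y <= last row index.
    """
    col0, row0 = int(x), int(y)
    col1 = min(col0 + 1, len(field[0]) - 1)   # clamp at the grid edge
    row1 = min(row0 + 1, len(field) - 1)
    fx, fy = x - col0, y - row0               # fractional offsets inside the cell
    near = (1 - fx) * field[row0][col0] + fx * field[row0][col1]
    far = (1 - fx) * field[row1][col0] + fx * field[row1][col1]
    return (1 - fy) * near + fy * far


# Made-up 2 x 2 patch of 24-hr pressure changes (mb) surrounding a station.
patch = [[0.0, 10.0],
         [20.0, 30.0]]
centre_value = bilinear(patch, 0.5, 0.5)
```

At the centre of the cell the result is simply the mean of the four corner values.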

5. Results for the dynamic model

The BIMD 06 program of the BIMD library was used to perform the
least-squares regression analysis based upon the use of equation (23)
for the dynamic model. This was accomplished for each of the three
data stratifications shown in Table 1. Relevant statistics obtained by
this analysis are displayed in Table 2. Here PRV stands for percent
reduction of variance, while R is the multiple correlation coefficient.
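
As a rough illustration of the statistics named above, the sketch below fits a two-predictor regression of the form used for the dynamic model and computes PRV and R. It is not the BIMD 06 program: the function names and the sample values are made up, and the normal equations are solved directly, which is adequate only for a small, well-conditioned problem.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit(rows, y):
    """Least-squares coefficients [A0, A1, A2, ...] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coeffs, row):
    return coeffs[0] + sum(a * x for a, x in zip(coeffs[1:], row))


def prv(coeffs, rows, y):
    """Fractional reduction of variance: 1 - residual SS / total SS."""
    mean = sum(y) / len(y)
    sse = sum((yi - predict(coeffs, r)) ** 2 for r, yi in zip(rows, y))
    sst = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - sse / sst


# Made-up sample: each row is (thickness advection, vorticity advection).
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0), (6.0, 2.0)]
y = [5.0, 2.5, 0.0, -1.5, -3.5, -4.0]
coeffs = fit(rows, y)
dependent_prv = prv(coeffs, rows, y)
multiple_r = dependent_prv ** 0.5
```

The same `prv` function can be applied to an independent sample with the coefficients held fixed, which is how a loss of skill on new data would show up.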

After analyzing the results obtained by stratifying the data, it was
decided to develop the final regression equation from the fifteen
stations over or near the land areas, since this group gave the greater
percent reduction in variance. The resulting equation was (land areas only):

Ap9  =  6.59-  o^-\VE-V7)-l.8(-VEV^) .         (25) 

Note that the regression coefficients are negative, indicating that
advection of higher values of $\eta$ and of h each cause a negative
contribution to the pressure change. This agrees with usual synoptic
observations.

Since the assumption of a linear relationship between the predictors
and the predictand was not necessarily valid, scattergrams of $\Delta p_0$
versus both independent variables were examined in order to investigate
possible indications of a preferred relationship. The BIMD 27 program
as adapted to the CDC 1604 computer was employed to plot these scattergrams.
The results using thickness and absolute vorticity advection,
respectively, are shown in figures 1 and 2. Examination of these figures
shows that while a linear regression between $\Delta p_0$ and $-\mathbf{V}_E\cdot\nabla h$
is reasonably valid, the relationship between $\Delta p_0$ and $-\mathbf{V}_E\cdot\nabla\eta$ (footnote 2)
appears to be non-linear with no obvious correlation.

The simple correlation coefficients found between $\Delta p_0$ and $-\mathbf{V}_E\cdot\nabla h$
and $-\mathbf{V}_E\cdot\nabla\eta$ were -.41 and -.04 respectively (see also Table 2a).
Note that the percent reduction in variance attributed to the partial
correlation of the vorticity advection is insignificant, indicating that
the thickness-advection parameter will give equally good results when
used alone as a predictor. This was not due to a significant correlation
between the variables $-\mathbf{V}_E\cdot\nabla h$ and $-\mathbf{V}_E\cdot\nabla\eta$, since their simple
correlation coefficient was only 0.17.

The regression equation developed from these parameters was tested
on an independent sample of 75 cases drawn from data gathered in the
month of December with the results shown in Table 2b.

The apparent lack of success of the 24-hour absolute vorticity
advection, as measured by the method here employed, to furnish
significant predictability for the subsequent 24-hour pressure change
was surprising, particularly in view of the comparative usefulness of
the thickness advection. Some of the primary reasons for these
differences in the statistical behavior of these two predictors were
revealed upon a critical re-examination of the initial-data fields. These
are discussed in subsections (a), (b) and (c), below, with some resulting
conclusions in (d).

² The symbol $\eta$ has been introduced to represent $(\bar{z}_5 - z_5 + J)$.

(a) Advection of 500 mb absolute vorticity. The values of this variable
are sensitively dependent upon the field of $\bar{z}_5 - z_5 + J$. This field had the
characteristic of exhibiting unusually strong gradients over short
distances near the vorticity centers, with relatively weak gradients
elsewhere. Thus any cumulative error in constructing a 24-hour upwind
trajectory, based on the use of the equivalent-advecting wind $\mathbf{V}_E$,
can give rise to sizeable differences in the value of $\bar{z}_5 - z_5 + J$ to be
advected. Reed [1962] has also referred to the question of trajectory
accuracy.

(b) Advection of thickness. The printed fields of $h = z_5 - z_0$ displayed
a smaller degree of non-linearity over short distances and were subject
to considerably smaller upwind-point error. The difference in
appearance of the $\bar{z}_5 - z_5 + J$ and $z_5 - z_0$ fields may be attributable to
the smoothing process used in obtaining $\bar{z}_5$, whereas no smoothing was
employed in obtaining the thickness field.

(c) Approximation $\mathbf{V}_E = k\bar{\mathbf{V}}_5$ for advection of absolute vorticity.
In a small percentage of cases of 24-hour 500-mb advection the value
of $-k[\bar{\mathbf{V}}_5\cdot\nabla(\bar{z}_5 - z_5 + J)]$ was not equal to that using the
advecting wind $\mathbf{V}_E = k\bar{\mathbf{V}}_5$. This occurred when a 24-hour
trajectory passed over an extreme value of $\bar{z}_5 - z_5 + J$.

(d) It must be concluded that the statistical Lagrangian technique
employed here suffers from the defect of employing excessive time
steps. While the procedure bears some similarity to the Fjortoft
technique, it does not have the advantage of scanning comparative data
from all latitudes within the grid map. Hence the smoothing capability
usually available in most prognostic procedures (and in analysis in
general) could not be used as a prognostic aid here.

6. A multivariate linear regression analysis for 24-hour prediction
of sea-level pressure

In this phase of the investigation a total of 28 independent variables
were tested as possible predictors in a purely statistical approach.
Recent investigations by Ostby-Viegas [1960], Miller [1962], and
others employing statistical reduction methods have utilized the
advantages of a computer to reduce large numbers of possible predictors
to a significant few. Such methods may use either objectively-determined
data with no immediate rationale for the relationship, or dynamically-based
data to select variables. Such significant multiple linear regressions
as exist are found by a statistical screening process.
Once an objectively chosen variable has been selected through the
screening process it is usually possible to find a causal relationship
between it and the predictand on the basis of synoptic-dynamic
considerations.

In this section, the intent was to utilize the two predictors already
employed in the dynamic model (see equation (23)). Since the predictors
already chosen rest heavily on 500-mb parameters it was decided to
choose, as far as possible, additional objective parameters from this
level. Furthermore, since the month of November 1962 was characterized
by contrasting mid-latitude regimes in the Pacific and Atlantic
Oceans, with higher than normal mid-latitude zonal flow in the Pacific
and anomalous blocking action in the Atlantic, it was felt that the
24-hour values of the dynamic (advective) parameters might contain
considerable scatter at stations influenced by such contrasting meteorological
activity. Consequently it was decided to select a limited number
of 500-mb parameters indicative of the long-wave features and, to some
degree, of the extended-period circulation anomalies. Recourse is
made here to some of the extended-forecast concepts of Namias [1951].
At each station the latest 500-mb height z and those for each of the
three preceding 24-hour map times were read off or interpolated from
the contoured printout charts. From these data the two sets of parameters,
the two-day trend $T_i$ and three-day mean $\bar{Z}_i$, were computed
for each station, and a regression equation of the form

APo/u  =  Bio+  Bi,  (-vE.y /?)  +  6LZ  C-VE.V.b)         (26, 

is sought, where j is a 15° longitude interval, considered positive
eastward. The subscript i-2j indicates that we are examining data
at all upwind points 30° longitude apart. Note that the two-day trend
at station i is given by

$$T_i = z_0 - z_{-2} \tag{27}$$

and $\bar{Z}_i$ for simplicity has been defined here as the simple arithmetic
mean

$$\bar{Z}_i = (z_0 + z_{-1} + z_{-2} + z_{-3})/4, \tag{28}$$

where the subscripts 0 to -3 denote the latest 500-mb map time and the
three preceding 24-hour map times.

The implication of the summation sign before the $T_{i-2j}$ and $\bar{Z}_{i-2j}$
terms is that we are considering all of the large-scale upwind and downwind
rates of change and mean heights, in contributing predictability
to $\Delta p_0$ at the site denoted by the subscript i.
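
The two height parameters and the station indexing can be sketched as below, taking the trend as the latest height minus the height two map times earlier and the mean as the average of the latest four map times. The function names, the sample heights, and the wrap-around of station numbers past station 1 are assumptions for illustration.

```python
STATION_COUNT = 24  # stations spaced 15 degrees apart around 50N


def two_day_trend(heights):
    """heights = [z_0, z_-1, z_-2, z_-3], latest map time first."""
    return heights[0] - heights[2]


def three_day_mean(heights):
    """Arithmetic mean of the latest height and the three preceding 24-hr values."""
    return sum(heights[:4]) / 4.0


def upwind_station(i, j):
    """Station number for subscript i-2j, i.e. 30*j degrees west of station i.

    Assumes the numbering simply wraps around the latitude circle.
    """
    return (i - 1 - 2 * j) % STATION_COUNT + 1


# Made-up 500-mb heights at one station, latest map time first.
sample_heights = [5520.0, 5500.0, 5460.0, 5480.0]
trend = two_day_trend(sample_heights)
mean_height = three_day_mean(sample_heights)
```

With this indexing, `upwind_station(i, 3)` is the point 90 degrees of longitude upstream of station i.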

The variables $U_i$ and $V_i$ are the 850-mb zonal and meridional
geostrophic-wind components at station i. They are presented in
equation (26), firstly, in order to include possible relevant low-level
effects, and secondly, in order that a physically important factor such
as $\overline{UV}$ (where the superior bar indicates a zonal mean) may at least
be implicit within the set of independent variables. The significance
of this cross-covariance is that it suggests effects of the zonal-index
cycle (see Haltiner and Martin [1957, pp. 446-448]).

With 28 possible predictors appearing in (26), the BIMD 09 program
was used to perform the regression analysis of the statistical model.
This program is a modification of one originally written by M. A.
Efroymson [1955] of the Esso Research and Engineering Company.
The important features of the program include a stepwise screening
of the predictors using arbitrary upper and lower critical F-values
as cutoff limits for inclusion or rejection of variables.

The F statistic as employed in this test is the ratio of the mean
squares explained by the regression to the residual or unexplained
mean squares. According to Anderson [1960] the F-ratio is the ratio
of two chi-square distributed variables with k and n-k-1 degrees of
freedom, where n is the number of cases in the data sample. Miller
[1962] suggests values for the critical or cutoff F-values for introducing
predictors into (26). If P is the total number of possible predictors
(28 in this investigation) and k is the number of predictors already
selected, his critical F-value is given as


c* 


P~Jk+\ 


(29) 


where  of  —   -^^ .     The  value«K>fc-  =    .05  is  the  usual  critical 

P-vfc+l 

significance  level  in  the  selection  test.     It  is  apparent  that  the  level 
imposed  by  this  method  of  determining  a  critical  F-value  will  decrease 
as  more  predictors  are  chosen. 

Inasmuch as the BIMD 09 screening program uses a fixed F-level
throughout the screening process, it seems desirable to perform several
regression analyses of the data with differing F-levels but retaining one
coinciding with Miller's (F = 10.0). In a recent analysis (Martin et al.
[1963]) the recommendation is made that a lower F-level for rejection of
a previously selected variable should be taken as zero when using the
BIMD 09 program with Miller's selection criterion. Accordingly,
predictors were arbitrarily selected in this analysis with upper critical
F-levels of 10.0 and 5.0, and a lower limit in each case of zero.
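
A screening of this kind can be sketched as a generic forward selection with a fixed F-to-enter level. This is not the BIMD 09 program: the function names and data are made up, and no removal step is included, on the assumption that a rejection level of zero in effect never discards a selected variable. Collinear predictors would make the small solver below fail.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def residual_ss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus the given columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(d[a] * d[b] for d in design) for b in range(p)] for a in range(p)]
    xty = [sum(d[a] * yi for d, yi in zip(design, y)) for a in range(p)]
    coeffs = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coeffs, d))) ** 2
               for d, yi in zip(design, y))


def forward_select(predictors, y, f_enter):
    """Add predictors one at a time while the best candidate's F-to-enter >= f_enter.

    Returns a list of (name, F on entry) in the order of selection.
    """
    selected, history = [], []
    current = residual_ss([], y)
    while len(selected) < len(predictors):
        dof = len(y) - len(selected) - 2   # cases minus coefficients after adding one
        if dof <= 0:
            break
        best = None
        for name, column in predictors.items():
            if name in selected:
                continue
            trial = residual_ss([predictors[s] for s in selected] + [column], y)
            f_value = float("inf") if trial == 0 else (current - trial) / (trial / dof)
            if best is None or f_value > best[1]:
                best = (name, f_value, trial)
        if best is None or best[1] < f_enter:
            break
        selected.append(best[0])
        history.append((best[0], best[1]))
        current = best[2]
    return history


# Made-up data: y depends on x1 plus a small disturbance; x2 is unrelated.
x1 = [float(v) for v in range(1, 13)]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0, 8.0]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.1, -0.3, 0.4, -0.1, 0.2, -0.2, -0.1]
y = [2.0 * a + e for a, e in zip(x1, noise)]
steps = forward_select({"x1": x1, "x2": x2}, y, f_enter=10.0)
```

Running the same data with a lower `f_enter` admits more predictors, which mirrors the comparison of the 10.0 and 5.0 levels described above.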

Selected parameters are shown in Table 3 in the order in which they
were chosen by the analysis of the dependent data. The form of the F-test
used to compute the significance of the final regression equation
is given (after Anderson [1960, p. 89]) as below:

$$F(j,\, n-j-1) = \frac{R^2}{1-R^2}\cdot\frac{n-j-1}{j} \tag{30}$$

where R is the multiple correlation coefficient. The percent reduction
in variance, $R^2$, is given by the formula


$$R^2 = 1 - \left(\frac{s_r}{s_t}\right)^2 \tag{31}$$

where $s_r$, the reduced standard error, and $s_t$, the total standard
error, are available at each step of the BIMD 09 program printout.
The cumulative percent reduction in variance was computed for
each step of the regression (see Table 3). Note that using Miller's
suggested initial F-level of 10.0 only two parameters are chosen by the
screening process. These significant predictors are $-\mathbf{V}_E\cdot\nabla h$,
the advection of thickness as employed in the dynamic model, and $T_i$,
the 2-day 500-mb height change over the station. Together, these
parameters give a cumulative percent reduction in variance of 18.43.
When the F-level is lowered to 5.0 four additional parameters are
selected, each of which contributes a relatively small gain in PRV.
The four additional parameters selected (in the order chosen) were
$-\mathbf{V}_E\cdot\nabla\eta$, or the advection of $\bar{z}_5 - z_5 + J$ as employed in the dynamic
model; $U_i$, the zonal geostrophic component of the 850-mb wind over the
station; $T_{i-6}$, the 2-day 500-mb height trend 90° upstream;
and $\bar{Z}_{i-10}$, the 3-day mean 500-mb height 150° upstream. The
resulting regression equation which includes the six selected variables
(see Table 3) is

$$\Delta p_0 = 116.1 - 1.6(-\mathbf{V}_E\cdot\nabla h) - .18(-\mathbf{V}_E\cdot\nabla\eta) - .28\,T_i - .54\,U_i - .24\,T_{i-6} - .14\,\bar{Z}_{i-10}. \tag{32}$$
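
The significance test for the final equation can be checked numerically. The sketch below evaluates F from the reduction of variance for j predictors and n cases, using the reported 22.04 percent reduction for six predictors and taking n = 450 from the F(6,443) entry in Table 3. The function names are illustrative, and the second helper assumes the reduction of variance is one minus the squared ratio of the reduced to the total standard error.

```python
def f_from_r2(r2, n, j):
    """F statistic with (j, n - j - 1) degrees of freedom from the fraction r2 of variance explained."""
    return (r2 / (1.0 - r2)) * ((n - j - 1) / j)


def prv_from_standard_errors(reduced, total):
    """Fractional reduction of variance from reduced and total standard errors (assumed form)."""
    return 1.0 - (reduced / total) ** 2


final_f = f_from_r2(0.2204, n=450, j=6)
```

The value obtained is close to the F(6,443) of 20.9 listed in Table 3.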

While the computed F-levels indicate that these last predictors are
statistically significant it is possible that such correlations as are
indicated arise from "noise" and/or erroneous data. Under such circumstances
the regression equation may tend to "overfit" the sample. A
discussion of this effect is given by Panofsky and Brier [1958, p. 176].

The existence of "overfitting" of the dependent sample is demonstrated
by the instability of the regression equation when applied to the independent
data. This phenomenon may evidence itself by a substantial decrease
("shrinkage") in the percent reduction of variance explained with the independent
sample. For example, when equation (32) was tested on an
independent sample of 75 cases, a PRV of 10.6 percent occurred, compared
to 22.04 percent for the dependent sample. Although the "shrinkage" here
is large, some stability of the final equation is indicated and an improvement
shown over the dynamic model.

From practical considerations we wish to use only the most efficient
predictors. Those which offer little improvement in predictability (by
the added percent reduction in variance criterion) are chiefly of
theoretical interest. Undoubtedly the most practical prediction
equation giving the least overfit would involve the two parameters
selected in the statistical model with F ≥ 10, namely, advection of
thickness $-\mathbf{V}_E\cdot\nabla h$ and the 2-day height difference $T_i$. The
usefulness of those predictors with lower F-levels is doubtful, especially
when the small percent reduction in variance achieved by their use is
considered.

Errors in data sampling have already been referred to in section 5,
particularly in reference to the dynamic predictors. Other errors are
map-scale error and interpolation errors, both of which are considered
to be negligible in this instance.

7. Conclusions

The prediction equations developed by this investigation do not offer
a significant improvement over existing methods. This is felt to be due
partly to the limitations inherent in applying the statistical technique as
a linear operator and partly to the fact that single-point
observations fail to capture gradient effects in the manner of a closely
spaced grid. Considering that the chosen latitude and season indicate
that baroclinic development can normally be expected, any useful
filter should predict non-linear effects in a consistent fashion. The
results obtained by applying the developed regression equations to the
independent samples indicate that this is not being done and that we
must recognize some of the shortcomings of the measurement methods.
The most immediate improvements suggested from the results are:
(1) decreasing the Lagrangian time step; and (2) employing an entire
grid map to verify actual $\Delta p_0$ patterns.

Table 1. Stratification of data for analysis of the dynamic model

Grouping                Number of stations    Population size
                        in sample

Individual stations             1                    30
Continental stations           15                   450
Ocean stations                  9                   270

Total of stations              24                   720

Table 2. Summary of pertinent statistics for various sample
stratifications for the dynamic model

(a) Dependent data

Grouping          Sample size   Variable                            PRV     R      Final F value

Total stations    720           $-\mathbf{V}_E\cdot\nabla\eta$      .002
                                $-\mathbf{V}_E\cdot\nabla h$        .144
                                combined                            .146    .382   61.31

Land stations     450           $-\mathbf{V}_E\cdot\nabla\eta$      .002
                                $-\mathbf{V}_E\cdot\nabla h$        .179
                                combined                            .181    .426   49.64

Ocean stations    270           $-\mathbf{V}_E\cdot\nabla\eta$      .004
                                $-\mathbf{V}_E\cdot\nabla h$        .130
                                combined                            .134    .366   20.62

(b) Independent data

Grouping          Sample size   Variable                            PRV     R      Final F value

Land stations     75            combined                            .01     .10    < 1

Table 3. Summary of results obtained from dependent data for the
statistical model

Predictor                           F level     Percent reduction   Coefficient in regression
                                    on entry    in variance         equation (F ≥ 5.0)

$-\mathbf{V}_E\cdot\nabla h$        90.6        16.65               -1.6
$T_i$                               10.8         1.78               -.28
$-\mathbf{V}_E\cdot\nabla\eta$       6.9         1.06               -.18
$U_i$                                6.5          .98               -.54
$T_{i-6}$ (90° upstream)             5.9          .86               -.24
$\bar{Z}_{i-10}$ (150° upstream)     5.0          .71               -.14
Constant term                        -            -                 116.1

Standard deviation of $\Delta p_0$, s = 7.484 mb
R² = 0.22
R = 0.48
F(6,443) = 20.9
Fc(.99) = 2.85